Barstow


Expect the Unexpected: FailSafe Long Context QA for Finance

Kamble, Kiran, Russak, Melisa, Mozolevskyi, Dmytro, Ali, Muayad, Russak, Mateusz, AlShikh, Waseem

arXiv.org Artificial Intelligence

We propose a new long-context financial benchmark, FailSafeQA, designed to test the robustness and context-awareness of LLMs against six variations in human-interface interactions in LLM-based query-answer systems within finance. We concentrate on two case studies: Query Failure and Context Failure. In the Query Failure scenario, we perturb the original query to vary in domain expertise, completeness, and linguistic accuracy. In the Context Failure case, we simulate the uploads of degraded, irrelevant, and empty documents. We employ the LLM-as-a-Judge methodology with Qwen2.5-72B-Instruct and use fine-grained rating criteria to define and calculate Robustness, Context Grounding, and Compliance scores for 24 off-the-shelf models. The results suggest that although some models excel at mitigating input perturbations, they must balance robust answering with the ability to refrain from hallucinating. Notably, Palmyra-Fin-128k-Instruct, recognized as the most compliant model, maintained strong baseline performance but encountered challenges in sustaining robust predictions in 17% of test cases. On the other hand, the most robust model, OpenAI o3-mini, fabricated information in 41% of tested cases. The results demonstrate that even high-performing models have significant room for improvement and highlight the role of FailSafeQA as a tool for developing LLMs optimized for dependability in financial applications. The dataset is available at: https://huggingface.co/datasets/Writer/FailSafeQA


The Fake Fake-News Problem and the Truth About Misinformation

The New Yorker

Millions of people have watched Mike Hughes die. It happened on February 22, 2020, not far from Highway 247 near the Mojave Desert city of Barstow, California. A homemade rocket ship with Hughes strapped in it took off from a launching pad mounted on a truck. A trail of steam billowed behind the rocket as it swerved and then shot upward, a detached parachute unfurling ominously in its wake. In a video recorded by the journalist Justin Chapman, Hughes disappears into the sky, a dark pinpoint in a vast, uncaring blueness.


The NVIDIA PilotNet Experiments

Bojarski, Mariusz, Chen, Chenyi, Daw, Joyjit, Değirmenci, Alperen, Deri, Joya, Firner, Bernhard, Flepp, Beat, Gogri, Sachin, Hong, Jesse, Jackel, Lawrence, Jia, Zhenhua, Lee, BJ, Liu, Bo, Liu, Fei, Muller, Urs, Payne, Samuel, Prasad, Nischal Kota Nagendra, Provodin, Artem, Roach, John, Rvachov, Timur, Tadimeti, Neha, van Engelen, Jesper, Wen, Haiguang, Yang, Eric, Yang, Zongyi

arXiv.org Artificial Intelligence

Four years ago, an experimental system known as PilotNet became the first NVIDIA system to steer an autonomous car along a roadway. This system represents a departure from the classical approach for self-driving in which the process is manually decomposed into a series of modules, each performing a different task. In PilotNet, on the other hand, a single deep neural network (DNN) takes pixels as input and produces a desired vehicle trajectory as output; there are no distinct internal modules connected by human-designed interfaces. We believe that handcrafted interfaces ultimately limit performance by restricting information flow through the system and that a learned approach, in combination with other artificial intelligence systems that add redundancy, will lead to better overall performing systems. We continue to conduct research toward that goal. This document describes the PilotNet lane-keeping effort, carried out over the past five years by our NVIDIA PilotNet group in Holmdel, New Jersey. Here we present a snapshot of system status in mid-2020 and highlight some of the work done by the PilotNet group.


How Anthony Levandowski Put Himself at the Center of an Industry

#artificialintelligence

If federal prosecutors successfully prosecute Anthony Levandowski for 33 federal charges of theft and attempted theft of trade secrets, the self-driving engineer could face millions in fines and decades in prison. The accusations aren't new--they rehash the core of Waymo's civil case against Uber, which settled in February 2018--but their resurfacing in this format threatens to put a dismal end to a career remarkable for its range and variation. For nearly 20 years, the French-American Levandowski has played a kind of purposeful Forrest Gump for the world of autonomous driving. Rather than stumbling into the center of one momentous event after another, Levandowski has put himself there. And he has left a mixed trail in his wake: Former colleagues have described him as brilliant, engaging, motivating, fast-charging, inconsiderate, a weasel, and just plain evil.


Robots, start your engines

AITopics Original Links

AMONG government organisations, America's Defence Advanced Research Projects Agency (DARPA) has always been somewhat unusual. As the research arm of the Department of Defence, it is akin to a high-stakes venture capitalist, gambling large sums of money (its estimated 2004 budget is $3 billion) on risky technologies that will probably fail, but could pay off in a big way. It has had some stupendous successes, such as the internet, the Saturn rocket and micro-electro-mechanical systems (tiny machines that work at the scale of a human cell). There have also been some resounding duds, such as the Total Information Awareness project, a Big Brotherish plan to spot terrorists by combing through databases of personal information, which was swiftly abandoned. But what is arguably DARPA's most outlandish scheme yet will start rolling on March 13th, when a gaggle of strange-looking vehicles will line up in Barstow, California to make a wild run across 250 miles of scrub and desert.


Automaker sees automated freeway travel within 2 years

AITopics Original Links

Cars that can talk to each other and almost drive themselves at freeway speeds are just two years away from the showroom, according to General Motors executives. The company announced Sunday that the semi-autonomous system for freeways will be an option on an unidentified new 2017 Cadillac model that goes on sale in the summer of 2016. In addition, the 2017 Cadillac CTS will be equipped with radio transmitters and receivers that will let it communicate with other cars, sharing data such as location, speed and whether the driver is applying the brakes. The announcements were made Sunday at the opening of the Intelligent Transportation Society World Congress being held in Detroit this week. They are part of a barrage of similar declarations that are expected from other companies throughout the week as the industry shows off progress toward self-driving and safer cars.


A Personal Account of the Development of Stanley, the Robot That Won the DARPA Grand Challenge

Thrun, Sebastian

AI Magazine

This article is my personal account of the work at Stanford on Stanley, the winning robot in the DARPA Grand Challenge. Between July 2004 and October 2005, my then-postdoc Michael Montemerlo and I led a team of students, engineers, and professionals with the single vision of claiming one of the most prestigious trophies in the field of robotics: the DARPA Grand Challenge (DARPA 2004). The Grand Challenge, organized by the U.S. government, was unprecedented in the nation's history. It was the first time that the U.S. Congress had appropriated a cash prize for advancing technological innovation. My team won this prize, competing with some 194 other teams. Stanley was the fastest of five robotic vehicles that, on October 8, 2005, successfully navigated a 131.6-mile-long course through California's Mojave Desert. This essay is not about the technology behind our success; for that I refer the interested reader to recent articles on the technical aspects of Stanley. Instead, this is my personal story of leading the Stanford Racing Team. It is the story of a team of people who built an autonomous robot in record time. It is also a success story for the field of artificial intelligence, as Stanley used some state-of-the-art AI methods in areas such as probabilistic inference, machine learning, and computer vision. Of course, it is also the story of a step towards a technology that, one day, might fundamentally change our lives.